ABSTRACT


The median webpage has increased in size by more than 80% in 
the last 4 years. This extra complexity allows for a rich browsing 
experience, but it hurts the majority of mobile users who still pay 
for their traffic. This has motivated several data-saving solutions, 
which aim at reducing the complexity of webpages by transforming 
their content. Despite each method being unique, they either reduce 
user privacy by further centralizing web traffic through data-saving 
middleboxes or introduce web compatibility (Web-compat) issues 
by removing content that breaks pages in unpredictable ways. 

In this paper, we argue that data-saving is still possible without
impacting either users’ privacy or Web-compat. Our main observation
is that Web images make up a large portion of Web traffic and
have negligible impact on Web-compat. To this end we make two
main contributions. First, we quantify the potential savings that
image manipulation, such as dimension resizing, quality compression,
and transcoding, enables at large scale: 300 landing and 880
internal pages. Next, we design and build BROWSELITE, an entirely
client-side tool that achieves such data savings by opportunistically
instrumenting existing server-side tooling to perform image
compression, while simultaneously reducing the total amount of
image data fetched. The effect of BROWSELITE on the user experience
is quantified using standard page load metrics and a real user
study of over 200 users across 50 optimized web pages. BROWSELITE
allows for similar savings to middlebox approaches, while offering
additional security, privacy, and Web-compat guarantees.
1 INTRODUCTION


A multitude of studies from academia and industry alike suggest upward
trends in the complexity and size of the mobile Web [9, 11, 15, 19, 39],
so much so that the median page has now reached 2 MB, up
over 80% from 2016 [15]. While this complexity has undoubtedly
created a richer browsing experience, it has downsides for
mobile users. Byte-heavy pages are responsible for frustratingly
slow browsing experiences on slower networks, along with significant
monetary costs for users on limited mobile data plans. In
Canada, for example, the median webpage costs $0.24 to load [34].

As a result, emphasis has been placed on changing Web brows- 
ing to consume less data [9, 11, 44, 46, 51]. While there has been 
some public confusion over the exact workings and reach of such 
data-saving methods [33], recent studies of their actual implemen- 
tations [31, 37, 54] reveal a few key shortcomings. 

First and foremost, these solutions raise various privacy concerns
for their users when compared to regular Web browsing. Some
are deployed as middlebox services which either transparently
proxy the user’s unencrypted traffic [11, 51], or apply URL redirection
[23] or Man-in-the-Middle proxies [44, 46, 58] to also operate
on encrypted traffic (HTTPS) [37, 54]. Given the rise of HTTPS [28],
the former sees limited use, while the latter breaks the end-to-end
principles of TLS, exposing potentially private or personalized Web
contents to third parties [37, 54].

Further, the exact measures these systems take to actually save
data are often cryptic [31, 33, 37, 54]. As determining Web-compat
issues, or the ability to quantify broken webpages, is an open problem
that requires a large amount of manual effort [41], it is hard to
determine how and when these solutions actually break webpages,
though when they do so there is usually public outcry [12].

The goal of this work is to devise a data-saving solution which 
solves the above shortcomings. Our intuition is twofold: first, such 
a solution needs to be client-side in order to eliminate the privacy 
and reach concerns of middlebox approaches. Namely, a client- 
side only approach has the potential to save data for personalized 
webpage content without exposing it to third parties. Second, by 
being image-centric it can impose virtually no impact on Web- 
compat, in comparison to, for example, JavaScript code elision [22, 
23, 44]. As the median webpage is comprised of 900 KB of images 
(or about 44% of its total size) [15], an image-based solution still 
has high potential for data-savings. 

Many of the aforementioned middlebox techniques [11, 51] auto- 
matically apply popular image manipulation techniques (resizing, 
quality compression, and transcoding) to save data before the last 
mile (see Section 2). While the weight of images across webpages 
is known to be quite high, it is unclear which fraction of pages in 
the wild can benefit from such techniques. Our first contribution 
is to quantify the impact such middlebox techniques have when 
optimizing images. This quantification represents an upper bound 
of the data savings which a more private client-side solution can 
potentially obtain. We analyze such techniques by compressing 
webpages across a crawl of size ~1.2k (300 landing and 880 internal 
webpages), in which we find 21% of total page weight can be saved 
at the median, and up to 90% at the 95th percentile (depending on 
webpage rank and presence of cold or hot caches). 

Motivated by this potential for image savings, we propose
BROWSELITE to optimize images in a two-fold, more private, fashion.
First, BROWSELITE takes advantage of existing compression
technologies typically found on Web servers [32], instrumenting
them from the client at run-time. BROWSELITE uses information
from images in the Document Object Model (DOM) to force their
compression while balancing their visual quality. This component
of BROWSELITE, URL REWRITING, identifies images as candidates for
compression using simple URL rewriting rules, which enable a 70%
reduction of image sizes on ~16% of webpages from our crawls.

From there, BROWSELITE takes a second approach to save data
for more general pages. This approach, which we call IMAGE FETCH
REDUCTION, uses a component of the HTTP standard known as
range requests to actually fetch less data from all Web images,
thereby reducing their network data usage. To alleviate the impact
on the user’s quality of experience (QoE) caused by rendering
only the requested portions of images, BROWSELITE differentiates
between two standard image types on the Web, baseline and progressive
images, during the page load. We show progressive images
can be rendered almost completely with only a fraction of their
data requested. For baseline images, we introduce our own technique,
reflection, to make empty spaces on the webpage from these
partial images appear less visually broken. We also show such techniques
incur a modest trade-off in user QoE using both systematic
metrics [16] and real user studies across 50 of our pages with 200
crowdsourced users (see Section 5).

Our experiments show that BROWSELITE reduces the bandwidth
consumption of pages, taken from the same ~1.2k webpages
of our crawls, by 25% at the median. While the primary function
of BROWSELITE is to save data, our experiments on the pages of
our crawls loaded over two network conditions show little effect,
or even slight improvements, to the SpeedIndex [16] of 80% of
pages, while causing modest overheads (<500ms) for the remaining 20%.
Further, our comparisons of BROWSELITE to existing middlebox
image reduction approaches show that savings from BROWSELITE
(Section 5) exceed those offered via plain HTTP proxies, and are
competitive with MITM middleboxes. While our comparisons to a
real data-saving system which optimizes full contents of webpages,
Google Web Light [23], do show the high potential of such tools to
save data, we believe BROWSELITE to be a competitive alternative
given its higher privacy, security, and Web-compat guarantees.


2 BACKGROUND AND RELATED WORK 


The research community has dedicated significant effort to design 
page load optimizations. These works [36, 42, 49, 57] aim at speeding 
up the load times of Web pages by optimizing the order of Web 
object retrieval over the network. Such reordering schemes require 
server-side knowledge or assistance, and/or are unlikely to help in 
terms of bandwidth savings [44]. 

Aside from load performance, data-savings is also a largely ex- 
plored topic, with several commercial solutions already available 
(e.g., [23, 46]). State of the art data-saving methods can be cate- 
gorized as: 1) middlebox transformations, 2) server-side resource 
optimizations, and 3) entirely client-side data saving approaches. 
In the following, we discuss each category in detail. 


Middlebox Transformations. These methods rely on transpar- 
ent HTTP proxies or middleboxes [11, 39, 51] to offer data savings 
via resource transformations. By operating transparently on path, 
they do not require server-side support. However, they cannot be 
used in presence of encrypted content (HTTPS) which is nowadays 
used by most websites [28, 55]. Popular resource transformations 
adopted by middleboxes include plain text gzip compression, image 
downsizing from its native dimensions to its rendered dimensions 
in the client’s viewport, and content transcoding to formats which 
offer higher compression, e.g., WebP. 

Flywheel [11], Google’s Web compression proxy, is perhaps the 
seminal work in the data-saving space. This proxy claims up to 80% 
data savings via transcoding of image formats, and gzip compres- 
sion of JavaScript and CSS. FlexiWeb [51] is a follow-up implemen- 
tation of Flywheel which leverages machine learning to optimize 
the trade-off between data savings and user quality-of-experience 
(QoE). Work from Alibaba [39] extends Flywheel’s techniques to 
any mobile application by intercepting (unencrypted) mobile traffic. 

Given the near-ubiquitous adoption of end-to-end encryption [28, 
55] the potential for implementing such transformations via trans- 
parent middleboxes remains in question. Further, privacy is of con- 
cern to these methods, given that they require third party access to 
contents of webpages, much of which may be personalized. 


Server-Side Resource Optimizations. These methods rely on 
some server side support to allow the clients to only fetch a min- 
imalist version of the page [9, 18, 22, 29, 44, 46]. The methods 
proposed include removal and reorganization of various Web ob- 
jects from the page [9, 44, 46], detection and elision of unnecessary 
code [18, 22, 29], and URL rewriting based on content similarity to 
enable smarter and more effective caching at the client [43]. Understanding
which portions of pages to remove, or which content to
rewrite, requires knowledge of the page state gained by completely
loading the webpage before it is sent to the client, which is why
these solutions require server-side support.

To allow for data-savings without explicit server-side control, 
many of the above methods [44, 46] can be implemented as man- 
in-the-middle proxies which either break TLS or leverage URL 
redirection, i.e., serve other Web page contents directly from their 
servers. For example, Google’s Web Light [9] redirects the initial 
request for a webpage through its servers, without sharing the 
page’s cookies to Google’s servers. While this is a plus for privacy, 
as no client-side state can be inferred, it limits the reach of the 
approach, given that personalized content cannot be optimized [54]. 
Further, such implementations leak information about which URLs 
are requested to third party servers, providing an opportunity to 
build full browsing profiles of end-users [54]. Recent work [37] has 
highlighted the above privacy risks of such approaches, and has also 
shown that many current implementations are built on outdated 
Figure 1: Room for bandwidth savings by adjusting image sizes. Shown in (a) and (c) are CDFs of potential raw and normalized 
savings respectively across pages of different ranks from our crawls. Normalization in (c) is against page weights with and 
without browser caching. As shown in (b), pages in the 90th percentile see room for over 7MB of savings. 


software, use substandard TLS certificate validation, and/or use 
weak TLS cipher suites, opening up users to additional security 
risks. 

Last but not least, these methods often make use of complicated 
rules for replacing/removing JavaScript [9, 22, 29, 58]. Other so- 
lutions label dead code based on offline randomized user input 
testing [18]. This implies that the efficacy of such solutions in terms 
of Web-compat remains quite uncertain and hard to measure, often 
requiring much manual effort to quantify [41]. Furthermore, when 
pages do break, there is often backlash from users and Web devel- 
opers alike due to the lack of transparency of these systems [12]. 


Client-Side Only Solutions. These methods run fully in the 
client, with no server support (either directly or via redirection) 
or support of on path middleboxes. While this approach offers the 
highest privacy guarantees, it is limited in which data-saving strate- 
gies can be adopted, since the actual contents of Web pages are 
unknown to clients until retrieved. 

Content blocking is the most common client-side data saving solu- 
tion. This strategy simply blocks resources which can be identified 
as non-useful to users at the time of their request, such as advertise- 
ments in the case of ad-blocking. While more complicated solutions 
that block potentially useful page components, e.g., JavaScript, are 
available [22], they require a priori knowledge of page contents 
gained from observing the page load over a period of time. 

Content blocking is also made available by Chromium under its 
Data-Saving mode to block all Web page images [40], replacing 
them with a single placeholder image. Image blocking saves users 
data, but it has drastic impact on the user experience of Web pages. 
While users are allowed to download these images individually, 
they have little to no context as to which images may be important 
to them. BROWSELITE balances data-savings and the user’s browsing
QoE; further, no user action is required. Our user study using
BROWSELITE revealed that users tend to rate pages without images 
as completely broken (1 on a 1-5 scale at the median), whereas 
pages with data-optimized images received much more favorable 
responses (3 at the median). 

The work in [59] outlines what can be done from the client in 
terms of page load optimization, but focuses mainly on speculative 


caching (similar to [43]), as well as content pre-fetching to improve 
latency rather than data-savings. While caching clearly reduces
data sent over the network, it does not offer data savings under
cold connections, whose prevalence and importance the same work
stresses [59]. BROWSELITE is primarily designed to save users
data under cold caches, though our results (Section 5) highlight
how BROWSELITE will not waste data in the presence of hot caches.
Further, recent works have shown security implications for too 
liberal caching policies, such as enforcing the caching of similar 
content between domains [53]. 


3 POTENTIAL FOR IMAGE SAVINGS 


We begin our measurement by determining the potential for savings 
made available through less private, middlebox based optimization 
of images. Our analysis serves the purpose of creating a baseline for 
which to compare the data-savings of our more private, exclusively 
client-side, BROWSELITE. While the weight of images across webpages
is known to be quite high (44% as per HTTP Archive) [15],
it is unclear how much of these images could be saved using the 
middlebox approaches of image resizing, quality compression, and 
transcoding (see Section 2). 


Methodology. We resort to Web crawling [56] to collect a repre- 
sentative dataset on the current status of image usage in the wild. 
To obtain a set of domains to crawl, we use the Majestic list [1] 
which contains the top million domains with the most referring 
subnets. We chose the Majestic list, as opposed to the more popular 
Alexa list [2], as it is a free alternative that is still exclusively based 
on Web browser traffic [50]. 

We crawl 3 buckets of webpage rankings from the Majestic list 
(top100, apr50k, and apr100k) [1]. For each bucket, we select the 
first 100 websites, e.g., pages 100-200 for the top100 and pages 
ranked 50,000-50,100 in the apr50k bucket. Given the importance 
of covering internal pages [14, 56], we crawl the first 10 links to 
the same domain, if available, from each top level domain. Our
crawls produced a dataset encompassing ~1.2k pages (300 landing and
880 internal), with, on average, 3 internal pages per domain.

We perform the crawls by scripting Chromium version 83 using
a combination of tooling via Lighthouse [3] and Puppeteer [8]. We
Figure 2: The resulting trade-offs in image quality, measured
via SSIM, across the different levels of potential savings
from Figure 1.


use Lighthouse to load a page and collect various network and in- 
page statistics, such as network bytes associated with each resource 
requested, and the final rendered locations and sizes associated with 
all images from the Cascading Style Sheets (CSS) of the webpage. 
We use Puppeteer to obtain the response bodies of all network 
requests, and to scroll through the page to capture information of 
images which may be lazy loaded, or those added only in presence 
of user interactions. We ensure each webpage (landing and internal) 
is loaded using a cold cache. However, we also save the necessary 
HTTP headers (i.e., Cache-Control, Expires, Last-Modified,
and ETag), to implement browser logic for determining whether a
given request is cacheable [20]. This allows us to simulate the load 
of these pages with caching enabled, offline after our crawls. This 
analysis is particularly important for internal pages, where many 
resources may be cached from the landing page. 
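
The offline cacheability check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the logic used in our crawls: the function name `is_cacheable` and the simplified rules (honoring `no-store`, `no-cache`, `max-age`, and `Expires`, and otherwise falling back to the presence of validators) are hypothetical and only approximate what a browser cache does.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_cacheable(headers, now=None):
    """Roughly decide whether a stored response could be reused.

    `headers` maps lower-cased header names to their values. The rules
    below are a simplified approximation of HTTP caching, for
    illustration only.
    """
    now = now or datetime.now(timezone.utc)

    # Parse Cache-Control into {directive: value}.
    directives = {}
    for part in headers.get("cache-control", "").split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"')

    # Validators allow a cheap revalidation (304) instead of a re-download.
    has_validator = "etag" in headers or "last-modified" in headers

    if "no-store" in directives:
        return False
    if "no-cache" in directives:
        return has_validator
    if "max-age" in directives:
        try:
            return int(directives["max-age"]) > 0
        except ValueError:
            return False
    if "expires" in headers:
        try:
            expires = parsedate_to_datetime(headers["expires"])
        except (TypeError, ValueError):
            return False  # an unparsable date is treated as already expired
        if expires.tzinfo is None:
            expires = expires.replace(tzinfo=timezone.utc)
        return expires > now
    return has_validator
```

A request whose headers pass this check, and which was already seen on the landing page, would be treated as served from cache when simulating the load of an internal page.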

Lighthouse currently provides rough estimates of wasted bytes
due to a page failing to implement standard image data-saving
optimizations (see Section 2). For our measurements, we extend
these estimates in two ways. In the following, we detail both extensions
to Lighthouse reports and, for each, provide an analysis of
the extensions based on the ~1.2k websites from our crawl.


Image compression pipeline. We estimate the potential data 
waste in images by manipulating all the HTTP response bodies of 
image requests through 3 image optimization techniques as recom- 
mended by Lighthouse and employed by proxy based approaches 
(e.g., Flywheel [11]), that is, image resizing, quality compression, 
and transcoding. Our first extension to Lighthouse data-saving mea- 
surements is to pipeline images through these 3 optimizations, as 
Lighthouse currently only applies these individually. This approach 
can underestimate savings as these optimizations compound to save 
data [11, 51]. We employ two versions of the pipeline, standard and 
extreme, which have trade-offs in terms of savings versus potential 
impact on the user’s QoE. To quantify the reduction in QoE caused 
by image savings, we use the structural similarity metric [60] (SSIM), 
a full image reference metric which is commonly used to measure 
quality degradation in images due to transformations (blurring, 
compression, color reduction, etc.). 

For the resizing component of the pipeline, in standard mode, 
we resize the image as sent over the network to its CSS attributes 
width and height; for extreme mode we resize the image to half of 
these values. We note that the CSS attributes depend on the size 
of the viewport, and hence smaller viewports may achieve higher 
relative savings for the same image at the same perceptual quality. 
In all our experiments, we instrument Chromium to emulate a 
Pixel 2 which has viewport size of 411x731 (5.5 inches), or the most 
popular size in 2019 [17]. Following image resizing, all jpeg, tiff, png, 
bmp, and gif images are transcoded to WebP, a “next generation” 
image format which offers higher compression with visual quality 
comparable to the other formats [13]. In standard mode WebP 
images are compressed by reducing their quality setting to 85 (out 
of 100), as this is reported as the best trade-off between savings and 
SSIM degradation according to previous works [11, 51]. In extreme 
mode, we aggressively reduce image qualities to 10 (out of 100). 
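
The parameters of the two pipeline modes can be summarized in a short sketch. The names `TransformPlan` and `plan_transform` are hypothetical, and the code only computes the target settings (it does not decode or encode images); clamping to the native dimensions so that images are never upscaled is an added assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Formats transcoded to WebP, as listed in the text.
TRANSCODABLE = {"jpeg", "tiff", "png", "bmp", "gif"}

# mode -> (scale applied to the CSS size, WebP quality out of 100)
MODES = {"standard": (1.0, 85), "extreme": (0.5, 10)}


@dataclass(frozen=True)
class TransformPlan:
    width: int
    height: int
    fmt: str
    quality: Optional[int]  # None when the output is not WebP


def plan_transform(native_w, native_h, css_w, css_h, fmt, mode="standard"):
    """Return the resize/transcode/quality settings for one image."""
    scale, quality = MODES[mode]
    # Resize to the CSS dimensions (standard) or half of them (extreme).
    # Assumption: never upscale beyond the native size.
    width = min(native_w, max(1, round(css_w * scale)))
    height = min(native_h, max(1, round(css_h * scale)))
    fmt = fmt.lower()
    out_fmt = "webp" if fmt in TRANSCODABLE else fmt
    return TransformPlan(width, height, out_fmt,
                         quality if out_fmt == "webp" else None)
```

For instance, a 1200x900 JPEG rendered at 400x300 CSS pixels would be planned as a 400x300 WebP at quality 85 in standard mode, and as a 200x150 WebP at quality 10 in extreme mode.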

Results from the image compression pipeline are shown in Fig- 
ure 1 (a) and (b). Pictured first is the CDF of savings, in KBytes, for 
images across pages from the top100 and apr50k websites. What we 
can observe is that lower ranked pages see generally higher savings, 
with medians of 152KB, 292KB, and 308KB for standard mode of 
top100, apr50k and apr100k respectively, and 189KB, 375KB, 390KB 
for extreme mode on the same. This result is intuitive, as more pop- 
ular pages are expected to be more optimized. While the savings 
offered for the median page are rather modest, the distribution of 
savings is quite long-tailed, as shown in Figure 1 (b). The figure 
shows that even top ranked pages see savings of over 3MB at the 
95th percentile. Further, the gap between the standard and extreme 
modes is modest, being at most 100KB at the median page across 
the ranks. This is likely because savings offered through resizing 
and reducing quality begin to diminish when configured further 
past that of our standard level. 

Outside of raw bytes, we compare savings as a fraction of the 
total page weight. Figure 1 (c) provides the normalized savings, 
for our standard mode, across our crawls broken up by page rank. 
As before, the lower ranked pages offer more potential savings, 
with 10.3%, 20.1%, 21.9% of the median page being saved for top100, 
apr50k, and apr100k respectively. This jumps to 30.8%, 38.9%, and 
44.5% at the 75th percentile across the respective crawls. 

We also analyze the savings normalized by the weight of pages 
under caching. Specifically, we do not count URLs (including im- 
ages) that are a) found on both top level pages and second-level 
pages, and b) marked as cacheable by their HTTP headers [20] 
towards the total weight and savings of second-level pages. Note 
that we assume a double-keyed HTTP cache [26, 30]. It follows that 
when determining repeat URLs, the same image on different do- 
mains will be counted for savings across both pages. Figure 1 (c) also 
shows this result, where we can observe that savings increase in the 
presence of warm caches; specifically savings of up to 14.2%, 21.6%, 
and 36.8% are observed for the median page in top100, apr50k, and 
apr100k respectively. This is because many Web resources, other 
than images, are shared between landing pages and inner pages, 
thus increasing the relative weight of data-saving techniques. 
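
The effect of caching on normalized savings can be illustrated with a small sketch. The function `normalized_savings` and the resource dictionary layout are hypothetical; the sketch assumes that `landing_urls` holds URLs already fetched on the landing page of the same site, consistent with a double-keyed cache.

```python
def normalized_savings(resources, landing_urls=frozenset()):
    """Fraction of page weight saved, optionally under a warm cache.

    Each resource is a dict with keys "url", "bytes", "cacheable", and
    optionally "saved" (bytes saved by image optimization). Resources
    that were seen on the landing page and are cacheable count towards
    neither the page weight nor the savings.
    """
    total = saved = 0
    for res in resources:
        if res["url"] in landing_urls and res["cacheable"]:
            continue  # served from cache: no network bytes
        total += res["bytes"]
        saved += res.get("saved", 0)
    return saved / total if total else 0.0
```

With a shared 600-byte script and a page-specific 400-byte image that can be shrunk by 100 bytes, the cold-cache saving is 100/1000 = 10%, while the warm-cache saving is 100/400 = 25%, since the cached script no longer contributes to the page weight.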

Moving to SSIM analysis, we observe that standard mode pro- 
vides rather mild QoE trade-offs, while the trade-off for the extreme 
mode is harsher. Figure 2 shows paired boxplots of the resulting 
SSIM metrics of images, optimized by both standard and extreme 
modes and bucketed by their resulting percentile of data-savings. 
While the median SSIM of images for all levels of savings in standard 
Figure 3: Log scaled difference in savings (KBytes) when (a)
not considering CSS background images and (b) when failing
to appropriately calculate savings for CSS sprites.


mode does not fall below 0.95 (0.83 for the 25th percentile), the median
of extreme mode sits at 0.74 across all levels of savings. Finally,
we also observed that images that benefit more from data-savings 
see lesser reductions to their visual quality. This analysis suggests 
that high data-savings are possible with modest QoE impact, with 
standard mode offering the more preferable trade-off. 


CSS sprites. Lighthouse ignores potential savings from resizing im- 
ages which are embedded by CSS, or CSS background images. This 
is because a typical use for background images is CSS sprites [25], 
or images that consist of multiple smaller images embedded in one 
parent file (or sprite sheet). The sprite sheet is then dynamically 
cropped by the webpage to render the component images. Mea- 
suring savings for CSS sprites correctly requires accounting for 
the final size and location of all sprites, not just the original image. 
Due to this complexity, Lighthouse currently ignores calculating 
data-savings for all background images, thus leading to potential 
savings underestimations. 

We identify CSS sprites, or more generally any images that are 
cropped dynamically using the CSS property background-position, 
and separate these from normal CSS background images. We com- 
pute savings for normal background images using the pipeline 
previously discussed in this section. To calculate savings for CSS 
sprites, we compare the total area used by sprite sheets on the 
page with the total area of these images as sent over the network. 
Figure 3 (a) shows the Cumulative Distribution Function (CDF) of 
the data-savings which are currently missed by Lighthouse due 
to ignoring background images. Figure 3 (b) shows a CDF of the 
overestimation in savings seen from incorrectly resizing entire CSS 
sprite sheets to their component sprite size. The CDFs refer to the 
full set of pages from our dataset, and show the change in KBytes 
of savings in log scale. The graphs start at (a) 0.875 and (b) 0.6 for 
visibility as only 12.5% and 40% of pages were affected for (a) and 
(b) respectively. While not accounting for background CSS images 
only misses < 50 KB of savings for 88% of pages, there are 5% of 
pages for which > 100 KB are missed. Resizing the CSS sprite sheet 
to the size of a single sprite can cause an overestimation of 100 KB 
and 2 MB of savings for the pages at the 60th and 90th percentile, 
respectively. 
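
The area comparison for CSS sprites can be sketched as follows. This is an illustration under simplifying assumptions rather than our measurement code: the function name `sprite_sheet_savings` is hypothetical, bytes are assumed to scale proportionally with pixel area, and overlapping crops are not merged (identical crops are counted once).

```python
def sprite_sheet_savings(sheet_w, sheet_h, sheet_bytes, sprites):
    """Estimate bytes wasted by unused regions of a sprite sheet.

    `sprites` is an iterable of (x, y, w, h) crop rectangles, in sheet
    pixels, that the page actually renders.
    """
    sheet_area = sheet_w * sheet_h
    if sheet_area <= 0:
        return 0.0
    used = 0
    for x, y, w, h in set(sprites):  # the same crop may be reused on a page
        # Clip each crop to the bounds of the sheet.
        clipped_w = max(0, min(x + w, sheet_w) - max(x, 0))
        clipped_h = max(0, min(y + h, sheet_h) - max(y, 0))
        used += clipped_w * clipped_h
    used = min(used, sheet_area)
    return sheet_bytes * (1 - used / sheet_area)
```

For a 100x100 sheet of 10,000 bytes where two distinct 50x50 sprites are rendered, this estimate is 5,000 bytes, whereas naively resizing the whole sheet to a single 50x50 sprite would claim 7,500 bytes and thus overestimate the savings.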
4 IMPLEMENTATION


Measurements from our previous section are motivating evidence 
that image optimizations contribute to significant bandwidth sav- 
ings for Web browsing, especially in the upper tail of lesser-optimized 
pages which see savings on the order of MBytes. However, such 
savings represent a bound for image savings attained by either 
privacy invasive approaches, e.g., using middleboxes, TLS intercep- 
tion, URL rewriting, or solutions which require some form of server 
collaboration. 

In this section, we introduce BROWSELITE, a collection of techniques
which realize image data-savings directly at the client (e.g.,
a browser), thus offering higher privacy guarantees. BROWSELITE
consists of two main techniques, URL REWRITING and IMAGE FETCH
REDUCTION, which are both described in the following.


4.1 Image Server Instrumentation from the 
Client 


The key challenge for a client-side data saving approach is that 
it cannot apply transformations in the same way as middleboxes; 
when images are received by the browser, the user’s data has al- 
ready been wasted. Instead, it requires a way to ask the server 
for a more compressed version of images. Our intuition is that 
this is possible thanks to the recent widespread adoption of image services 
run by popular content delivery networks (CDNs) like Fastly [4], 
Akamai [5], and CloudFlare [6]. These image services offer similar 
means as middleboxes to reduce image file sizes. Such savings are 
typically made accessible by configuring parameters in the URL of 
the image being served by the CDN. 

However, these image services are not always configured opti- 
mally. For one, images uploaded to these services may be resized in 
a “one-size fits all” manner, for convenience, even though mobile 
pages, for instance, are accessed from a large diversity of device 
types and screen sizes. For smaller devices, this implies images can 
be further resized without observable quality degradation. Further, 
browser fragmentation means modern image formats (e.g., WebP 
and AVIF) are not always supported. For these reasons, image ser- 
vices can be configured in an overly conservative manner to not 
deliver these modern formats, missing out on significant savings 
(see Figure 1). Lastly, image services may be configured to deliver 
images under higher visual quality (as determined, for example, 
by SSIM discussed in Section 3), rather than higher data-savings. 
However, many image formats are able to keep much of the same 
visual quality at even a significant level of compression, e.g., jpeg 
and WebP images provide no noticeable visual quality reduction at 
85% compression levels [11] (See Section 3). As we will show later 
(see Table 4), these configurations of image services can miss up to 
70% savings across real pages. 

We design BROWSELITE to detect the use of such image services,
and to uncover potential data-savings in their configurations.
Specifically, BROWSELITE detects whether or not an image server
supports the same transformations as the standard mode outlined
in Section 3, that is, CSS right-sizing, quality reduction, and format
transcoding. If such support is detected, BROWSELITE modifies
HTTP requests for these images in real-time to automatically apply
such transformations, thereby optimizing bandwidth consumption.
We call this component of our work URL REWRITING.
To identify the presence or lack of an image service associated
with a given image, BROWSELITE searches for parameters in the URL
of images that might relate to the image’s dimensions, compression
level, and format. For example, in the URL¹, the parameters w_400,
q_100, and extension .jpg correspond to the actual dimensions,
quality compression level (out of 100), and format of the downloaded
image. The match between the image data and the URL parameters
suggests that the image can be dynamically resized, compressed,
and transcoded just by changing such parameters on the fly.
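
A minimal sketch of this matching step is shown below. The function name `find_candidate_params` is hypothetical, and the code assumes the native width, height, quality, and format of the downloaded image are already known; it simply reports where those values appear in the URL.

```python
import re


def find_candidate_params(url, width, height, quality, fmt):
    """Locate URL substrings equal to a native property of the image.

    Returns a list of (kind, start, end) spans. Numbers are matched only
    when they are not part of a longer run of digits.
    """
    hits = []
    for kind, value in (("width", width), ("height", height),
                        ("quality", quality)):
        for m in re.finditer(rf"(?<!\d){value}(?!\d)", url):
            hits.append((kind, m.start(), m.end()))
    # The format is matched as a file extension ending a path segment.
    ext = rf"\.({re.escape(fmt)})(?=$|[/?#])"
    for m in re.finditer(ext, url, flags=re.IGNORECASE):
        hits.append(("format", m.start(1), m.end(1)))
    return hits
```

On a hypothetical URL such as `https://img.example.com/fill/w_400,h_52,q_100/photo.jpg`, calling the function with a 400x52 JPEG at quality 100 would flag the `400`, `52`, `100`, and `jpg` substrings as candidates for rewriting.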

Editing URL parameters is not without risk, as a URL may simply 
be statically defined with no image service available, and should 
thus not be edited. This manner of false positive can, at the very
least, cause an extra unnecessary round trip and diminish bandwidth
savings, and at worst actually increase bandwidth usage.

To avoid latency and bandwidth harming requests as much as 
possible, BROWSELITE takes a two step approach to URL REWRITING. 
First, we rewrite any value in the URL that matches a native size, 
quality, or format property of the image. Intuitively, this method 
achieves high true positive rate, but also high false positive rate. 
Thus, second, we generate a series of rules from the true positives 
to increase precision; instead of rewriting any location in the URL 
matching a property, we only rewrite URL parameters matching 
such rules (e.g., w_ in our example above). We further extend these 
rules by manually exploring mobile vs. non-mobile versions of 
the page and the image service APIs of 12 popular CDNs². We 
create such rules across the ~10k images obtained from our crawls 
outlined in Section 3. In the end we chose a subset of rules for 
matching which gave the best trade off in terms of true positive 
and true negative rates for all our observed images. 
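
The second, rule-based step can be illustrated with a short sketch. Only the `w_` prefix is taken from the example in the text; the `h_` and `q_` entries, the `RULES` table, and the function `rewrite_url` are hypothetical stand-ins for the actual rule set.

```python
import re

# Hypothetical rule table: URL parameter prefix -> image property.
RULES = {"w_": "width", "h_": "height", "q_": "quality"}


def rewrite_url(url, targets, rules=RULES):
    """Rewrite only those numeric URL parameters that match a known rule.

    `targets` maps a property name ("width", "height", "quality") to the
    value that should be requested instead.
    """
    prefixes = "|".join(re.escape(p) for p in rules)
    # A prefix must start a token (not follow a letter or digit) and be
    # followed by a complete number.
    pattern = re.compile(rf"(?<![A-Za-z0-9])({prefixes})(\d+)(?!\d)")

    def substitute(match):
        prefix = match.group(1)
        kind = rules[prefix]
        if kind in targets:
            return f"{prefix}{targets[kind]}"
        return match.group(0)  # no target for this property: leave as is

    return pattern.sub(substitute, url)
```

Compared to rewriting every matching number, this only touches values that follow a known parameter prefix, so a URL like `https://example.com/images/400/photo.jpg` is left unchanged even if the image is 400 pixels wide.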

Figure 4 summarizes the final results of the URL REWRITING
process. We can observe that a total of 50 unique rules were utilized
which affect 16.1% of images for an error rate of 7.2%. Breaking
down the error rate, 6.6% of images returned an HTTP 404 status code,
implying a need for a single re-fetch when running BROWSELITE.
The remaining 0.6% returned a size greater than the original size. Overall,
of the affected images, 69.9% of the original image size was saved
on average. We discuss the impact URL REWRITING has on page
data-savings in the following section.


4.2 Fetching Less with Range Requests 


The HTTP standard outlines the ability to request arbitrary bytes
of an HTTP response over the network. This is achieved by attaching
an HTTP Range header to a request, which indicates the range of
bytes of the resource that the server should send. Assuming this
capability is properly implemented at the server, a 206 status is
returned for the request, with the requested subset of bytes as the
response body. As Web image formats were designed to render even
under slow or lossy networks, not only can images be partially
requested, but major image codecs (e.g., libpng,
libjpeg, libwebp) support partial rendering as well. As 96% of the
servers in our crawls supported range requests, our rationale is to
combine the use of range requests with URL REWRITING to achieve
savings on a more general set of webpages. We call the process by
which BROWSELITE requests less data for images its IMAGE FETCH
REDUCTION component. In Section 5 we compare the effectiveness of
the combined URL REWRITING and IMAGE FETCH REDUCTION components of
BROWSELITE to existing approaches, e.g., middleboxes and Google
Web Light [23], in terms of data-savings.

¹ https://static.wixstatic.com/media/98a2de_37749ccfe79f48d1a977af77d1c2bd0e~mv2.jpg/v1/fill/w_400,h_52,al_c,q_100,usm_0.66_1.00_0.01/Classic_Car_Painting_By_Pavel_Hol%C3%BD.jpg
² Rules and manual efforts found in the document: https://tinyurl.com/86srq141.

While requesting 50% of the bytes of an image clearly yields 50%
data-savings for that image, what remains is the impact of
semi-complete images on user QoE. Generally, the first X% of the
bytes of a Web image can be used to render the topmost X% of its
pixels. This implies that only the top half (approximately, given
compression) of an image will be displayed given a 50% range
request, making most images appear broken.

However, there are some factors which can alleviate this impact. 
First, progressiveness is a rendering mechanism of jpeg and png 
formats in which image data is encoded such that it is able to be 
rendered in layers as opposed to top-to-bottom. What this implies is 
that an image can be rendered in its entirety, albeit at a lower quality 
(SSIM), given only a fraction of the available bytes. As discussed 
in Section 3, much of the potential data-savings for images come 
from the fact that they are sent at a larger size than they will be 
rendered at the client. Due to downsizing, when they arrive at the 
client certain progressive images can be rendered at high quality 
even with a fraction of their payloads requested. 

For non-progressive images, when applying IMAGE FETCH REDUC- 
TION, BROWSELITE performs a visual trick to make pages appear 
less visually broken while still attaining meaningful savings. This 
technique takes the partial data of the image that is obtained over 
the network, and fills in the broken gaps of empty space with re- 
flected and blurred content of the image, which we call reflections. 
The idea stems from a popular technique on the Web known as 
image previews [27]. This is a technique employed by many popular 
Web services (e.g., Facebook [35] and Medium [48]) where pages 
display small and blurred portions (on the order of bytes) of im- 
ages before the full versions are downloaded, as opposed to empty 
spaces or placeholders. However, since BROWSELITE does not have 
access to server-side control, and thus full image data, we cannot 
pre-process the images offline to make previews. Instead, we use 
the partial data in the range request to make reflections on the fly, 
at the client. 
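
As a rough illustration of the reflection idea, the sketch below works on a toy grayscale image stored as a list of pixel rows. It assumes the partial download decoded into the top rows only; the function names, the mirroring rule, and the box blur are illustrative choices, not the implementation described in this paper (which operates on decoded browser images).

```python
def reflect_fill(rows, full_height):
    """Fill the missing bottom rows by mirroring the rows already fetched."""
    have = len(rows)
    if have == 0:
        raise ValueError("need at least one decoded row to reflect")
    out = [list(r) for r in rows]
    for y in range(have, full_height):
        k = (y - have) % (2 * have)
        # Walk back up from the last fetched row, then bounce down again.
        src = have - 1 - k if k < have else k - have
        out.append(list(rows[src]))
    return out


def box_blur_region(img, start_row, radius=1):
    """Box-blur only the rows from start_row down (the reflected part)."""
    h, w = len(img), len(img[0])
    out = [list(r) for r in img]
    for y in range(start_row, h):
        for x in range(w):
            total = count = 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total // count
    return out


fetched = [[10, 10], [20, 20]]            # top half of a 4-row image
filled = reflect_fill(fetched, full_height=4)
blurred = box_blur_region(filled, start_row=len(fetched))
```

Blurring only the reflected region keeps the genuinely downloaded pixels sharp while softening the seam, which is the visual effect the image-preview technique relies on.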

Figure 5 shows a visualization of IMAGE FETCH REDUCTION in the
case of regular (a) and progressive (b) images. For Figure 5 (a),
both images show the page with 50% of the image data requested.
When reflection is applied (leftmost), the page is 95% visually
complete (according to the SpeedIndex metric, see Section 5.2 for
more details), but only 73% visually complete without reflection.
Figure 5 (b) shows a page with 15% (leftmost) and 80% (rightmost)
of the image data requested, respectively. Given that the
progressive image is sent over the network at dimensions much
larger than its final rendered ones, the page is still 95% visually
complete with only 15% of the data fetched.


4.3 BROWSELITE 


We implement URL REWRITING and IMAGE FETCH REDUCTION as a
Puppeteer [8] application. While we test with Chromium version 83,
our application can function out of the box on any Chromium-based
browser, or any browser that supports the Chrome Debugging Protocol
(e.g., Brave, Edge, and Opera [21]). Further, while BROWSELITE is
prototyped as an external application, many of the components use
internal browser APIs. We discuss the potential for BROWSELITE to
be fully integrated with the browser in Section 6.

TPR     FPR (404)   FPR (No Savings)   Savings
15.4%   6.1%        0.6%               66.2%
3.7%    1.2%        0.01%              53.7%
3.9%    0.1%        0.01%              44.1%
16.1%   6.6%        0.6%               69.9%

Figure 4: Quantifying the effectiveness of instrumenting image
services for better image optimizations from the client. Savings
is in terms of average reduction in size of these images. While
only a small fraction of pages can be rewritten in this way, the
relative savings is quite large.

To perform URL REWRITING, BROWSELITE intercepts all HTTP
requests associated with images as defined by the Chromium network
stack. Each request URL is associated with a DOM node of the Web
page, which can be used to extract the image’s CSS width and height
as needed for URL REWRITING. Next, the actual request URL is run
through a regex to reconfigure the URL parameters to fetch a
lighter version of the image, if possible, in the same manner
as discussed in Section 3 and Table 4.

Moving to IMAGE FETCH REDUCTION, since BROWSELITE does
not assume server cooperation, it does not know the file size of
an image a priori, which is needed to form a range request. For
this reason, BROWSELITE assigns a range header to instead request
the first 2KB of every image. Contained in the server’s response
is the image’s full size in bytes, and the metadata for the image.
This metadata is immediately passed to the browser since it is used
to facilitate the final layout of the page [10], which should not be
delayed. Once the full image size is known, a second range request
is immediately issued to obtain the lighter version of the image as
a fraction of its known total size.
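
The two-request procedure can be sketched as plain header arithmetic. This is a minimal sketch, assuming the server reports the total size in a standard `Content-Range` response header (`bytes first-last/total`); the helper names and the 50% fraction in the example are illustrative.

```python
import re

PROBE_BYTES = 2048  # the initial 2KB probe described above


def probe_header():
    """Range header for the first request (bytes are 0-indexed, inclusive)."""
    return {"Range": f"bytes=0-{PROBE_BYTES - 1}"}


def total_size(content_range):
    """Parse e.g. 'bytes 0-2047/123456' into 123456; None if unknown."""
    m = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", content_range.strip())
    if not m or m.group(3) == "*":
        return None
    return int(m.group(3))


def follow_up_header(total, fraction, already=PROBE_BYTES):
    """Range header for the bytes still missing to reach `fraction` of the
    image, or None when the probe already covers that much."""
    want = min(total, int(total * fraction))
    if want <= already:
        return None
    return {"Range": f"bytes={already}-{want - 1}"}


size = total_size("bytes 0-2047/100000")
print(probe_header(), follow_up_header(size, 0.5))
```

Starting the second range where the probe ended means the two responses can simply be concatenated, so the probe's 2KB is never downloaded twice.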

Following the request procedure, the resultant image is built 
in memory using the concatenated data from the first and second 
range requests to prevent wasted bandwidth. To display the im- 
age from memory, the image is in-lined in the Web page using a 
dataURI [24] on the associated DOM element which was previ- 
ously obtained. The metadata from the initial 2KB of the image 
is used to determine, on the fly, its progressiveness. If the image 
is progressive, the dataURI simply consists of the data requested 
over the network. If the image is not progressive, the image data is 
decoded, reflected, blurred, and re-encoded to a final dataURI. 
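
One way the progressiveness check and the in-lining step could look is sketched below. It relies on two format facts: a progressive JPEG announces itself with a progressive start-of-frame marker (e.g., 0xFFC2) in its header segments, and a PNG stores its interlace method in the last byte of the IHDR chunk (offset 28 in the file). The function names are illustrative, and whether BROWSELITE checks exactly these fields is an assumption.

```python
import base64
import struct

PROGRESSIVE_SOF = {0xC2, 0xC6, 0xCA, 0xCE}
ALL_SOF = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
           0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def is_progressive(head):
    """Best-effort check on the first bytes of an image.

    Returns True/False, or None when the format is unknown or the
    frame header is not within `head`.
    """
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        # IHDR is the first chunk; interlace byte is at offset 28 (1 = Adam7).
        return head[28] == 1 if len(head) > 28 else None
    if head.startswith(b"\xff\xd8"):
        pos = 2
        while pos + 4 <= len(head):
            if head[pos] != 0xFF:
                return None            # lost sync with the marker stream
            marker = head[pos + 1]
            if marker == 0xFF:         # fill byte before a marker
                pos += 1
                continue
            if marker in ALL_SOF:
                return marker in PROGRESSIVE_SOF
            (length,) = struct.unpack(">H", head[pos + 2:pos + 4])
            pos += 2 + length          # skip marker plus segment payload
        return None
    return None


def to_data_uri(chunks, mime):
    """Concatenate the range responses and in-line them as a data URI."""
    payload = base64.b64encode(b"".join(chunks)).decode("ascii")
    return f"data:{mime};base64,{payload}"
```

Walking the JPEG segments, rather than searching for the byte pair anywhere in the buffer, avoids being fooled by marker-like bytes inside embedded metadata.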

Finally, fallback cases are necessary for both the URL REWRITING
and IMAGE FETCH REDUCTION components of BROWSELITE. If the
initial 2KB request returns a 404, then URL REWRITING is aborted
and only IMAGE FETCH REDUCTION is used. In case of a false positive,
e.g., the returned image is bigger than the original, BROWSELITE
likewise proceeds with IMAGE FETCH REDUCTION only. For IMAGE FETCH
REDUCTION, range request support is determined on the fly: a server
is deemed not to support it if it does not respond with the
expected initial 2KB, or does not return the expected response
headers to notify BROWSELITE of the image’s total size. In these
cases, the full image, after having been subjected to URL
REWRITING, is downloaded in its entirety. As discussed before,
BROWSELITE only sees about 7% false positives when URL REWRITING,
and only 4% of servers do not support range requests needed for
IMAGE FETCH REDUCTION.
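
The fallback cases can be summarized as a small decision function. This is only a sketch under stated assumptions: the `Probe` type, the `plan` function, and the idea of comparing against a probe of the original URL to detect a larger rewritten image are hypothetical, since the text does not specify how the size comparison is made.

```python
from typing import NamedTuple, Optional


class Probe(NamedTuple):
    status: int                 # HTTP status of the initial 2KB request
    total_size: Optional[int]   # size from the response headers, if any


def plan(rewritten: Optional[Probe], original: Probe):
    """Return (url_kind, mode): which URL to fetch and whether to use a
    range request ("range") or download the image in full ("full")."""
    use_rewritten = (
        rewritten is not None
        and rewritten.status in (200, 206)          # e.g. a 404 aborts rewriting
        and (rewritten.total_size is None
             or original.total_size is None
             or rewritten.total_size < original.total_size)
    )
    chosen = rewritten if use_rewritten else original
    # Range support: a 206 reply that also reveals the total size.
    ranged = chosen.status == 206 and chosen.total_size is not None
    return ("rewritten" if use_rewritten else "original",
            "range" if ranged else "full")


print(plan(Probe(404, None), Probe(206, 1000)))   # rewriting aborted
print(plan(Probe(200, None), Probe(200, None)))   # no range support
```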


5 RESULTS 


We move to evaluate BROWSELITE in terms of its impact on user QoE,
data-savings, and page load performance. We also compare the
data-savings of BROWSELITE to the less private middlebox
optimizations of Section 3 and Google Web Light.


5.1 Bandwidth Savings and Visual Trade-offs 


We begin by analyzing the potential data-savings offered through
URL REWRITING. Figure 6 shows the CDF of the fraction of page
bytes saved, per the URLs in each bucket of our crawls (Section 3).
The figure shows that URL REWRITING can only be used for 10-20%
of the URLs in the two higher ranked popular buckets (top100 and
apr50k), while it has little effect on less popular URLs (apr100k).
This is because higher ranked pages are more likely to make use of
image services, which URL REWRITING opportunistically exploits.

For the pages where URL REWRITING can be used, it offers
significant data savings. For example, 5% of pages in apr50k and
top100 see savings of 20-30%, or 400KB saved per page, on average.
With respect to user experience, images have a median SSIM value
of .97 (see Figure 1(c)), a negligible quality reduction which is
not detectable by a user. These high SSIM values are observed
because these images are rewritten using standard means of
compression.

Next, we quantify the data savings and QoE impact of IMAGE FETCH
REDUCTION. While SSIM is useful to quantify the visual impact of
transformations that affect the entire contents of an image (e.g.,
blurring and color compression), it was not intended to compare
full images against partially complete images, as are produced by
IMAGE FETCH REDUCTION. We instead quantify visual impact for pages
using visual completeness, a component of SpeedIndex [16, 45],
which is a Web performance metric describing the average time in
which Web pages are rendered. Visual completeness is the comparison
mechanism used by SpeedIndex to determine the fraction of a loading
Web page that is rendered at a given point in time, allowing it to
reasonably measure the visual impact of partially complete images
and pages. Under the hood, visual completeness compares color
histograms from screenshots of the Web page at points in its load
to a screenshot of its fully rendered state. The visual
completeness at a given time is the fraction of pixels with colors
matching the final state.
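
A simplified version of this histogram comparison is sketched below. It treats a screenshot as a flat sequence of hashable color values and computes a plain histogram intersection; this is an approximation for illustration only, since the actual SpeedIndex computation works on per-channel histograms and also accounts for the page's initial state.

```python
from collections import Counter


def visual_completeness(current, final):
    """Fraction of the final screenshot's pixels whose color is already
    present, by histogram, in the current screenshot."""
    cur, fin = Counter(current), Counter(final)
    total = sum(fin.values())
    if total == 0:
        return 1.0
    matched = sum(min(cur[color], n) for color, n in fin.items())
    return matched / total


final = ["red"] * 4 + ["blue"] * 4
partial = ["red"] * 4 + ["white"] * 4     # bottom half not rendered yet
print(visual_completeness(partial, final))
```

Because only color counts are compared, not pixel positions, a reflected and blurred fill that reuses the image's own colors scores higher than an empty gap, which is consistent with the gains reported for reflections.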
Figure 5: The visual completeness of (a) reflection and (b)
progressiveness of images. The left pair shows the state of
the page with 50% of the image data requested. The page
with reflection is still 93% visually complete while the page
without is only 78% visually complete. The right pair shows
the state of a page with a large progressive image. The page
is already 95% visually complete with only 15% of the data
requested (99% with 80% requested).


Figures 7 (a) and 7 (b) compare the visual completeness values
of pages with various amounts of image data requested, in 10%
increments, relative to the pages with 100% of the image data
requested. Each point on the CDF represents the visual completeness
reached by 25%, 50%, and 75% of pages. Figure 7 (a) pictures all
pages together, and (b) pages with a supermajority (>=60%) of
progressive images (~11% of pages in our dataset). We can observe
that across all pages, the median page is still 90% visually
complete with only 50% of data requested. Likewise, pages in the
75th percentile are 90% visually complete with only 30% of the data
requested. The median page with a supermajority of progressive
images remains 90% visually complete even with only 10-15% of the
image data requested.

Figure 6: Page savings recovered by manipulation of server-side
compression via URL REWRITING. As ~16% of images are optimized,
savings are shown from the 80th percentile.

To expand on this result in terms of data saved, Figures 7 (c) 
and (d) show CDFs of savings across pages assuming various levels 
of image data requested — distinguishing between all pages (c) 
and the subset of pages hosting a supermajority of progressive 
images (d). Savings for general pages in (c) are shown for 25%, 50%, 
and 75% of the image data requested. Given Figure 7 has shown 
that, for progressive images, IMAGE FETCH REDUCTION can be more 
aggressive and reach higher visual completeness, savings in (d) are 
shown for 10%, 25%, and 50% of image data fetched. We can observe 
that the median page sees a 28% reduction in size by requesting 50% 
of image data. This jumps to 42% savings for 25% of image data 
requested. The progressive pages saw ~40% savings with 10% data
requested while remaining at least 90% visually complete.
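
As a back-of-the-envelope consistency check on these medians, one can assume (purely for illustration) that images make up a fixed share of a page's bytes and that IMAGE FETCH REDUCTION is the only source of savings, ignoring the 2KB probe. Medians of distributions need not compose this way, so the sketch is a sanity check and not a model of the measurements.

```python
def page_savings(image_share, fraction_requested):
    """Page-level savings when only `fraction_requested` of image bytes
    are fetched and images are `image_share` of the page weight."""
    return image_share * (1.0 - fraction_requested)


def implied_image_share(savings, fraction_requested):
    """Invert page_savings: image share implied by an observed saving."""
    return savings / (1.0 - fraction_requested)


share = implied_image_share(0.28, 0.50)   # from the 28% median at 50%
predicted = page_savings(share, 0.25)     # what 25% requested would give
print(round(share, 2), round(predicted, 2))
```

Under this assumption the 28% figure implies that images account for roughly 56% of the median page, and fetching only 25% of image data would then save about 42%, in line with the number reported above.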


5.2 User Experience 


While visual completeness is a useful proxy for quantifying the
impact BROWSELITE has on the user experience at scale, it is not a
substitute for feedback from real users. Thus, we performed a user
study to investigate how BROWSELITE affects the end user.

We selected 40 pages with regular images and 10 pages with
supermajority progressive images to show users in a crowdsourcing
study, run via the Microworkers [7] platform. Our study was a
simple Web page that contained screenshots of the first viewport of
the pages compressed via BROWSELITE. For the screenshots, the IMAGE
FETCH REDUCTION component of BROWSELITE was configured such
that 50% of the image data was requested across all pages. The 40
regular pages were chosen randomly from 4 buckets of visual
completeness relative to their original counterparts. These buckets
were chosen based on the distribution of pages in Figure 7 (a),
i.e., VC >= 95%, 90% <= VC < 95%, 85% <= VC < 90%, and VC < 85%,
for an average visual completeness of 89%. The 10 progressive pages
had a mean visual completeness of 97%. On the study Web page, we
showed the users the compressed version of each page with
BROWSELITE side by side with the original page. The formal question
we asked users was, ‘How would you rate the quality of this
compressed page which can extend your mobile data plan so you can
browse more?’. We provided the user a quality scale from 1-5 with
meanings for each choice given in Figure 8.

Figure 7: CDFs of savings offered by BROWSELITE as well as the
tradeoff in visual rendering quality. Plots (a) and (b) quantify
visual completeness values against the percent of image data
requested. The plots picture all pages and only those with a
supermajority of progressive images, respectively. Plots (c) and
(d) convey normalized savings for the same, broken up by three
levels of image data requested. For (d) the levels show less data
requested, since pages with progressive images are more visually
complete with the same amount of data requested.

For comparison, we also showed users screenshots of the same
pages optimized with the Google Web Light tool [23] (see Section 2).
While Web Light acts as a good upper bound for what can truly be
done in terms of data-savings, it is a less private approach, and not
easily quantified in terms of Web-compat. We showed 41 total Web
Light pages (by forcefully navigating pages through Web Light’s
servers at http://googleweblight.com/i?u=URL [23, 54]), as 9 of the
pages in our set of 50 could not be optimized by Web Light and
simply redirected to the original page, a behavior documented by
recent analyses [54]. Excluding these 9 pages, the Web Light
pages had a mean visual completeness of 73%.

Participants in our study were shown 20 of the 50 pages; for
each of the 20, either our version or the Web Light version (if
available) was randomly shown, to avoid bias from users comparing
two versions of the same URL during the study. Two controls were
also shown, one with a perfect rendition of a page (visual
completeness of 100), and one with a page where all images were
replaced by image placeholder icons (visual completeness of 29). We
only accepted users who rated these as 4-5 and 1-2, respectively,
which filtered out approximately 35% of user responses to our
study. We collected ratings from 200 Microworkers users (after
filtering), giving each page approximately 40 ratings. We provide a
link to an anonymous video of the study³, which shows both pages
optimized by BROWSELITE and by Web Light, as seen by our
Microworkers users.

Figure 8 shows the median rating of each page calculated over the
~40 user ratings collected per page. For BROWSELITE, we distinguish
between the pages with supermajority progressive images and
pages where the reflection trick was used (see Section 4.2). From
Figure 8 we draw a few observations. The first is that users rated all
10 progressive pages very highly (mostly very good and a few good),
corroborating that progressive images can see higher data-savings
for the same trade-off in the user experience, as shown in Figure 7.

The second is that the majority of pages with reflected images
(21 out of 40) were rated as usable by most users. Though none of
the pages were rated as broken, about 20% were given a poor rating.
Upon further investigation of these pages, we observed that this
rating was typically given if a) human faces were distorted or b)
actual text embedded within these images was reflected (and thus
unreadable). Conversely, pages with text overlaid on the image,
and not part of the image data, were rated quite positively (4 or 5).
While we do not take measures to prevent such cases in BROWSELITE,
we believe it to be a good direction for future work (see Section 6).
Further, the only page that received a median score of 1 was our
control page with broken placeholder images. This result suggests
that reflections are generally more favorable for the user experience
than placeholder images, a technique used by Chromium under its
Data-Saving mode for images [40].

³ https://streamable.com/e89yji

Finally, Figure 8 shows that Web Light, while removing and 
rearranging much of the page contents and having a much lower 
visual completeness on average, was rated generally highly by 
users (21 good pages). However, as Google’s servers have access 
to the contents before they are sent to the client, Web Light has 
more context, and time, to analyze the important parts of the page,
factors not available to BROWSELITE.


5.3 Range Requests and Caching 


One caveat in applying range requests to save data is the potential
impact on caching. For example, let’s assume a user of BROWSELITE
loads a page using a cellular connection, and then later loads the
same page under a Wi-Fi connection, where no data-saving is
necessary. Ideally, the portions of the images fetched using range
requests are not re-fetched when opting out of BROWSELITE.

Method                     Broken  Poor  Usable  Good  Very Good
                           (1)     (2)   (3)     (4)   (5)
BROWSELITE (Reflections)   0       8     21      8     3
BROWSELITE (Progressive)   0       0     0       3     7
Web Light                  0       2     11      21    7

Figure 8: User responses from our crowdsourced user study.
The rows represent the type of optimization: BROWSELITE
on pages with regular images, BROWSELITE on the progressive
image pages, and Web Light optimized pages.

Figure 9: The CDF in (a) shows SpeedIndex is inflated
by 70ms for the median page on Wi-Fi and 25ms on 4G. The
boxplots in (b) show the relative contributions from each
source of delay caused by BROWSELITE on the SpeedIndex.

The heuristics that browsers employ for caching content ranges 
are not well documented. We thus set up an experiment on Chromium 
(version 83) to determine how caching rules for range requests are 
currently handled. First, we instrument Chromium to fetch real 
pages and images from our study under a cold cache. Next, we 
perform the following experiments: 


(1) We issued two successive range requests for images with
ranges of 0-10KB and 10-20KB and found they were both cached
on subsequent requests, suggesting range requests of the same
range are cacheable.

(2) We issued two range requests with overlapping ranges and
found that the browser rewrites the range request to only fetch
the remaining data, e.g., a request for 0-20KB of an image
following a range request for 0-10KB of the same was rewritten
to request the range not yet in the cache: 10-20KB.

(3) We issued a request for the full image following a range request 
and found that no data was wasted. For example first request- 
ing 0-10KB of an image, followed by requesting the full image, 
only results in the remaining portion of the full image being 
requested by the browser. 

(4) The reverse of the above is also true: we issued a request for 
the full image followed by a range request for 0-10KB of the 
same and found the range request to be served from the cache. 
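
The behavior observed in experiments (1) through (4) is consistent with a simple interval subtraction: only the bytes not already cached are requested. The sketch below models that observation; it is not Chromium's cache code, and the function name and inclusive byte ranges are illustrative.

```python
def missing_ranges(cached, start, end):
    """Inclusive byte ranges within [start, end] that are not covered by
    any of the `cached` (first, last) ranges."""
    gaps = []
    cursor = start
    for c_start, c_end in sorted(cached):
        if c_end < cursor:
            continue                  # cached range lies before what we need
        if c_start > end:
            break                     # cached range lies past the request
        if c_start > cursor:
            gaps.append((cursor, c_start - 1))
        cursor = max(cursor, c_end + 1)
        if cursor > end:
            break
    if cursor <= end:
        gaps.append((cursor, end))
    return gaps


KB = 1024
# Experiment (2): 0-10KB cached, 0-20KB requested -> fetch only 10-20KB.
print(missing_ranges([(0, 10 * KB - 1)], 0, 20 * KB - 1))
# Experiment (4): full image cached -> the range is served from the cache.
print(missing_ranges([(0, 100 * KB - 1)], 0, 10 * KB - 1))
```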


These results add to the realism of a data-saving approach that
leverages range requests, in that switching between BROWSELITE
and normal browsing does not adversely affect HTTP caching, and
hence does not waste data. This result further implies that a)
flexible ranges can be used, say, if network conditions change, and
b) BROWSELITE can be turned off, presumably based on existing
signals such as the HTTP Save-Data header [31], with no excess data
waste.


5.4 Performance 


BROWSELITE was designed with data-savings rather than speed as its
main goal. However, to be usable, it is still important that pages are
not significantly slowed down, which we investigate here.

BROWSELITE introduces a few extra operations to page loads
which can potentially slow them down. The first is the extra range
request used to fetch image metadata. The second is the search in
the DOM to associate an image request with an object on the page.
The third is the extra processing needed to create the reflections
of non-progressive images. However, the data saved by BROWSELITE
has the potential to offset such overhead. Note that while the extra
request given an erroneous rule from URL REWRITING also adds
some latency, this occurs relatively infrequently, as shown in
Figure 4.

Figure 10: CDF of data-savings of middlebox approaches and
Google Web Light compared to those of BROWSELITE.

In order to quantify these additions to a Web page load, we
measure the average overhead in rendering time of pages as given by
the SpeedIndex [16] metric. Figure 9 shows the CDF of change in
SpeedIndex when loading pages normally and with BROWSELITE
across the ~1,200 pages of our crawls. We compare across two
network conditions, a Wi-Fi connection (https://fast.com reported
40Mbps up, 10Mbps down, 10ms RTT) and a Verizon 4G LTE connection
(4Mbps up, 3Mbps down, 40ms RTT). We benchmark BROWSELITE
with a visual completeness budget of 90%, implying that 50% of the
image data is requested (see Figure 7).

We can observe that ~80% of pages on Wi-Fi experience overheads
of <500 ms. Further, while 20% of pages experience a more
noticeable delay (>500 ms), 41-48% of pages actually see an
improved SpeedIndex by an average of ~400ms. While 4G connections
have higher RTT than Wi-Fi, implying the extra range requests of
BROWSELITE are further delayed, they also have lower bandwidth,
and thus benefit more from BROWSELITE’s data savings. The result
is that the distributions of change in SpeedIndex over Wi-Fi and 4G
are quantitatively similar. The improvement in SpeedIndex over
both connections is due to the fact that some pages have images
that actually render sooner since only 50% of the image content
needs to be requested. While the fact that 20% of pages experience
noticeable slowdowns is significant, we note that the primary
objective of BROWSELITE is to save bandwidth, and we expect users
will tolerate a slight delay in their pages in exchange for data
savings.

For the pages that see increased SpeedIndex, in Figure 9 (b) we
analyze the 3 main causes of overhead from BROWSELITE and their
relative contributions to the increases. From this, we can observe
that the largest fraction of overhead is indeed due to the extra RTT
(causing 50% of the inflation in the median page), followed closely
by transformation times using the browser’s native image canvas
APIs (45% in the median page). The DOM search is relatively fast in
comparison (5% in the median page). The contributions to overhead
were similar between the Wi-Fi and 4G experiments, with the extra
range request taking up 5% more overhead on 4G than on Wi-Fi. In
Section 6 we provide a few potential ways the transformation and
search overheads can be minimized going forward, mainly through
tighter implementation of BROWSELITE in the browser.


5.5 Comparisons 


Finally, we compare the data-savings of BROWSELITE (the combined
IMAGE FETCH REDUCTION and URL REWRITING components) to the
bandwidth savings made available through middlebox approaches as
well as those offered by Google Web Light. For BROWSELITE, we
target a visual completeness budget of at least 90% and thus
assume that 50% of image data should be fetched. For middleboxes,
we used the savings for the standard mode of our measurements in
Section 3. For Web Light pages, we navigated all the pages from
our crawls through the Web Light system and derived savings as
compared to the original versions of the same pages. All savings
are in terms of relative page weights saved, including hot caches of
inner-pages as described in Section 3.

Figure 10 shows the CDF of data-savings offered by BROWSELITE,
middlebox approaches that act as (a) MITM proxies, and (b) HTTP 
only proxies, and Google Web Light, across the pages of our crawls. 
The figure shows that Web Light acts as an upper bound for po- 
tential savings on pages, with the median page seeing up to 90% 
savings. As noted in Section 2, since all content is served through 
Web Light’s servers, more resources (outside images), and even the 
actual style of the page, can be directly manipulated. This comes 
not only at the cost of privacy concerns, but also significant impact 
on Web-compat; from our experiments, pages served via Web Light 
achieve only 60% visual completeness to their originals, on average. 
Web Light also failed to optimize ~10% of pages, as was observed 
for our user study and existing work [54]. 

While middlebox approaches see about 4% less savings compared
to BROWSELITE at the median (21.6% vs 25.4%), the upper percentiles
see up to 30% more savings. While middlebox approaches are also
limited to images, they intercept the content before it reaches the
client, allowing for more optimization opportunities at the cost of
privacy concerns similar to those of Web Light. Further, if we look
at only HTTP images from our dataset (we attempted HTTPS
connections for all pages), the median savings of middlebox
approaches drops to 0%, and to only 20% at the 90th percentile,
suggesting ~62% fewer pages are available to be optimized when not
intercepting TLS.


6 DISCUSSION 


This section discusses some subtle privacy concerns of BROWSELITE 
as well as complexities of an in-browser implementation, along with 
some future work based on results from our user study. 


Privacy considerations. While BROWSELITE is designed with
privacy in mind, one subtle privacy concern lies in the caching of
range requests. The current version of Chromium modifies range
requests based on information from its cache in order to only fetch
the next required portion of the range (see Section 5). Since
resources in the HTTP cache can be hit across domains, this implies
that a range request initiated on one domain can be resumed on
another, thus leaking information on what other sites have been
previously visited.

This attack, known as a cross site leak or XS-Leak [52], is not
specific to range requests. Many browsers have begun discussing
the implementation of (or have already implemented in the case
of Safari [26, 30]) dual-key caching. This policy prevents access to
cross-origin resources from the HTTP cache, with the main intent of
stopping XS-Leaks. As this feature will also prevent such XS-Leaks
with range requests, we expect the IMAGE FETCH REDUCTION feature
of BROWSELITE to remain available and safe for the future.

While BROWSELITE is implemented entirely client side for
privacy, techniques for savings, such as IMAGE FETCH REDUCTION and
more curated rules for URL REWRITING, could be implemented
privately at a CDN. However, the fact that 80% of landing pages and
60% of internal pages are not leveraging CDNs implies that a
client-side intervention is currently important [15].


Browser implementation. One concern for the adoption of BROWSELITE
is the performance impact of >500ms for 20% of pages (Figure 9(a)).
While the current version of BROWSELITE is implemented as a
Puppeteer application, a native implementation in the browser has
the potential to eliminate overhead caused by image processing (for
reflections) and DOM searches, which combined account for up to
50% of overhead (see Figure 9(b)). A native implementation can
associate DOM elements directly with network requests, eliminating
the need for a DOM search after the initial range request. Further,
native use of image libraries (e.g., libpng, libwebp) bundled with
the browser will allow for faster reflections, compared with our
current use of the high level canvas APIs and conversion of the
images to dataURIs. If BROWSELITE were to see adoption, we believe
these to be the next steps in its performance improvement. We do
not perceive a way to avoid the extra range request to discover
image metadata, which is a required step for BROWSELITE to work
properly.

User Studies and Quality of Experience. From our user study 
analyses, we made the observation that reflections were poorly 
rated when IMAGE FETCH REDUCTION resulted in images with dis- 
torted faces or text. In the future, we wish to test these hypotheses 
with additional user studies. However, if true, what can be done to 
alleviate this impact on the user experience remains in question. 
One possibility we deem worth exploring is to use facial or textual 
detection models (e.g., via CNNs [38]) on the partially downloaded 
image to identify presence of such features. Upon successful de- 
tection, an extra range request can be issued to better complete 
such images. Another approach is experimentation with context 
encoders [47], which can potentially complete our partially ren- 
dered images with no additional data cost. Either way, trade-offs in 
terms of the load times and bandwidth implications these proposed 
techniques may have on page loads will need to be explored. 


7 CONCLUSION 


Given the increased complexity and size of webpages, there has
been an enormous effort on the part of the web community to reduce
the data costs that the modern web imposes on end users. However,
existing approaches all trade off user privacy or web compatibility
in exchange for such data savings. This paper presents BROWSELITE,
a private data saving solution that focuses on optimization of
images during browsing. BROWSELITE reduces the data strain images
place on the network by reducing their data requirements, through
auto-configuring image services and by replacing standard HTTP
requests for images with range requests. As shown through our
experimentation, BROWSELITE is able to achieve 25% data savings
on the median webpage, with only a minor overhead on the page
load time. Further, BROWSELITE is able to reduce data requirements,
while keeping the median webpage usable, as reported by real users.
In future work, we plan to look at a tighter implementation of
BROWSELITE in modern browsers and to explore the effects of a
more advanced image processing pipeline to further reduce the
potentially negative impacts on the end-user experience.